Assessments such as cognitive and aptitude tests are used by organizations and institutions worldwide to evaluate prospective employees and students. These tests claim to provide insights into candidates’ behavior, reasoning, and other skills, and the data collected is used to inform decisions. But how do organizations know these tests measure what they claim to measure?
This is where the psychometric properties of a test come in. Psychometric properties identify and define critical aspects of an assessment, such as its suitability or reliability for use in a specific circumstance. For instance, if a test is presented as an appropriate tool for measuring a particular skill set or trait, its psychometric properties give test creators and users evidence of whether the instrument does what it claims. A good psychometric test must have three fundamental properties: reliability, validity, and norming. When hiring or developing employees, choosing the correct set of assessments is pivotal.
As discussed above, different psychometric properties provide distinct insights into a test’s meaningfulness, appropriateness, and usefulness. Some psychometric characteristics speak about the quality of the whole test, while others give weight to its constituent parts and sections. When considered in totality, the psychometric properties of an assessment could reveal whether the test assesses a single construct or multiple constructs.
Psychometric properties are most often expressed quantitatively, as a numerical value such as a coefficient or an index. Awareness of a test’s psychometric properties helps ensure that the information gained from it provides a firm foundation for making the right decisions.
A standardized test is administered and scored in a consistent, or standard, manner. It is designed to keep the questions, administration conditions, scoring procedures, and interpretations uniform. Standardized testing can consist of true-false questions, multiple-choice questions, authentic assessments, or essay questions. Almost any form of assessment can be shaped into a standardized test.
Here are the three psychometric characteristics that must be considered when creating or standardizing tests:
Psychometric reliability is the extent to which test scores are consistent and free from random measurement error. A reliable test yields precise, consistent scores every time it is taken. An assessment is considered reliable only if it produces similar results across multiple testing occasions, multiple test editions, or multiple raters grading the participant’s responses. Reliability is an essential component of a sound assessment.
Over the years, scholars and researchers have developed multiple ways to check for psychometric reliability. Some involve testing the same participants at different points in time or presenting participants with different versions of the same test to evaluate how consistent their scores are. Reliability is necessary for validity but does not guarantee it: a test whose scores are inconsistent cannot be valid.
The four types of psychometric reliability are:
1. Parallel forms reliability: Two equivalent versions of a test measure the same construct with different items, and each test-taker obtains similar results on both versions.
2. Internal consistency reliability: Items within the test are examined to see whether they all measure the same construct. The degree to which the items agree with one another is referred to as internal consistency.
3. Inter-rater reliability: Consistency between scorers is high when two or more raters score the same responses similarly.
4. Test-retest reliability: The same test is administered to the same test-takers at different points in time, and their scores remain consistent across administrations.
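Each of the four reliability types above is usually reported as a coefficient. The following is a minimal sketch, assuming small made-up score lists; the data and function names are illustrative and are not taken from any specific test publisher. A Pearson correlation stands in for test-retest (or parallel forms) reliability, Cronbach's alpha for internal consistency, and Cohen's kappa for inter-rater agreement.

```python
from statistics import mean, pvariance


def pearson_r(x, y):
    """Pearson correlation between two equally long lists of scores."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    ss_x = sum((a - mx) ** 2 for a in x)
    ss_y = sum((b - my) ** 2 for b in y)
    return cov / (ss_x * ss_y) ** 0.5


def cronbach_alpha(item_scores):
    """Cronbach's alpha; item_scores has one row per test-taker, one column per item."""
    k = len(item_scores[0])
    item_vars = [pvariance([row[i] for row in item_scores]) for i in range(k)]
    total_var = pvariance([sum(row) for row in item_scores])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa: agreement between two raters, corrected for chance."""
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    categories = set(rater_a) | set(rater_b)
    expected = sum(
        (rater_a.count(c) / n) * (rater_b.count(c) / n) for c in categories
    )
    return (observed - expected) / (1 - expected)


# Hypothetical data, for illustration only.
first_sitting = [10, 12, 14, 16, 18]    # scores at time 1
second_sitting = [11, 12, 15, 15, 19]   # same test-takers at time 2
items = [                               # 5 test-takers x 3 items
    [3, 4, 3],
    [4, 5, 4],
    [2, 2, 3],
    [5, 5, 5],
    [1, 2, 1],
]
rater_a = ["pass", "pass", "fail", "pass", "fail", "fail"]
rater_b = ["pass", "pass", "fail", "fail", "fail", "fail"]

print("test-retest r:", round(pearson_r(first_sitting, second_sitting), 3))
print("Cronbach's alpha:", round(cronbach_alpha(items), 3))
print("Cohen's kappa:", round(cohen_kappa(rater_a, rater_b), 3))
```

Values closer to 1 indicate more consistent scores. What counts as an acceptable coefficient depends on the purpose of the test and the stakes of the decision it informs.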
Validity is the degree to which the test measures what it claims to measure. As per the definition put forward by The Standards for Educational and Psychological Testing (2014), validity is the ‘degree to which evidence and theory support the interpretations of test scores for proposed uses of tests.’
Even though an assessment might be reliable, it may fail to provide the correct measure of the test-takers’ traits if it is not valid. Since the assessor will make decisions about the test-takers based on the assessment, the validity of the inferences drawn from it is crucial. Four types of validity can be measured, and all four should be considered to ensure a test is valid.
Norms are reference scores derived from a sample of test-takers who represent the intended population for the assessment.
Norming helps the test designer understand the group they are assessing and identify what is considered normal within the target group. For example, a test that is designed to evaluate the Java coding skills of experienced programmers, and that will be used to hire coders with five years of experience, should have a norming group made up of Java programmers with five years of experience.
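To illustrate how a norm group is used, here is a minimal sketch that places one candidate's raw score against a hypothetical norm group of Java programmers with five years of experience. The scores, the function names, and the choice of the sample standard deviation are assumptions made for illustration only.

```python
from statistics import mean, stdev


def z_score(raw, norm_scores):
    """Distance of a raw score from the norm-group mean, in standard deviations."""
    return (raw - mean(norm_scores)) / stdev(norm_scores)


def percentile_rank(raw, norm_scores):
    """Percentage of the norm group scoring below the raw score (ties count half)."""
    below = sum(s < raw for s in norm_scores)
    equal = sum(s == raw for s in norm_scores)
    return 100 * (below + 0.5 * equal) / len(norm_scores)


# Hypothetical norm-group scores; a real norm group would be far larger.
norm_group = [52, 58, 61, 64, 67, 70, 73, 76, 82, 87]
candidate_score = 76

print("z-score:", round(z_score(candidate_score, norm_group), 2))
print("percentile rank:", percentile_rank(candidate_score, norm_group))
```

The same raw score would translate into a different percentile rank against a different norm group, which is why the norm group needs to match the population the test will be used on.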
Mercer | Mettl, a global leader in online assessments, takes an exhaustive approach to assessment curation. For norming, data is collected from a representative sample of more than two thousand respondents of different ages, genders, job levels, and education levels. The convergent validity coefficients range from 0.4 to 0.67, and the reliability coefficients range from 0.63 to 0.73.
Furthermore, Mercer | Mettl assessments adhere to the Principles for the Validation and Use of Personnel Selection Procedures set by the Society for Industrial and Organizational Psychology (SIOP) and the Uniform Guidelines on Employee Selection Procedures adopted by the Equal Employment Opportunity Commission (EEOC). The tests also meet the Association of Test Publishers (ATP) and the American Psychological Association (APA) guidelines.
Here is how a Mercer | Mettl assessment curation process operates:
Organizations share their requirements, and a dedicated team of in-house subject matter experts works closely with them to curate the list of competencies and sub-competencies that need to be measured.
Once the skill or competency framework is finalized, these competencies are mapped to one or more relevant assessment tools. These tools range from psychometric tests to aptitude tests for specific industries or skills. The assessments can be in different formats, such as multiple-choice questions (MCQs), situational judgment tests (SJTs), and simulators.
Once the relevant tools for assessment are mapped, the next and most critical step is creating the questions for the test. These questions can be sourced from Mercer | Mettl’s existing question bank of over one million questions covering 3000+ skills.
The assessment platform is fully customizable. It allows organizations to select the difficulty level of the questions, set up the order of questions, allot specific time to each section and much more. They can also send invites to the candidates in bulk or during a selected time slot.
The reports generated are specific to each candidate and offer directional feedback on strengths and areas of improvement. The reports can be shared in formats like PDF or HTML. All the reports are customizable.
When an organization uses an assessment to evaluate candidates, it must have confidence that the test measures what it is supposed to measure and that its results are reliable over time. The psychometric properties of an assessment help organizations establish this.
Originally published April 12 2018, Updated June 5 2024
Vaishali has been working as a content creator at Mercer | Mettl since 2022. Her deep understanding and hands-on experience in curating content for education and B2B companies help her find innovative solutions for key business content requirements. She uses her expertise, creative writing style, and industry knowledge to improve brand communications.